For curves, finding the best-fitting curve is a complicated mathematical problem. What's nice about straight-line regression is that it's simple enough that you can calculate the least-squares parameters from explicit formulas. If you're interested (or if your professor insists that you're interested), we present a general outline of how those formulas are derived.
Think of a set of data containing $X_i$ and $Y_i$ values, in which $i$ is an index that identifies each observation in the set, as described in Chapter 2. From those data, SSQ can be calculated like this:

$$\mathrm{SSQ} = \sum_i \left( Y_i - (a + bX_i) \right)^2$$
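To make that formula concrete, here's a minimal Python sketch (our illustration, not part of the formal derivation) that evaluates SSQ for any candidate intercept $a$ and slope $b$:

```python
def ssq(a, b, x, y):
    """Sum of squared residuals for the candidate line y = a + b*x."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

# Example: how badly does the line y = 1 + 2x fit three data points?
print(ssq(1.0, 2.0, [1, 2, 3], [3.1, 4.9, 7.2]))
```

Least-squares regression amounts to finding the particular $a$ and $b$ that make this number as small as possible.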
If you’re good at first-semester calculus, you can find the values of a and b that
minimize SSQ by setting the partial derivatives of SSQ with respect to a and b
equal to 0. If you stink at calculus, trust that this leads to these two simultaneous
equations:
$$aN + b\left(\sum X\right) = \sum Y$$

$$a\left(\sum X\right) + b\left(\sum X^2\right) = \sum XY$$
where N is the number of observed data points.
These equations can be solved for a and b:
$$a = \frac{\left(\sum Y\right)\left(\sum X^2\right) - \left(\sum X\right)\left(\sum XY\right)}{N\left(\sum X^2\right) - \left(\sum X\right)^2}$$

$$b = \frac{\left(\sum XY\right) - a\left(\sum X\right)}{\sum X^2}$$
See Chapter 2 if you don’t feel comfortable reading the mathematical notations or
expressions in this section.
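If you'd like to see those formulas in action, here's a short Python sketch (again, just our illustration) that computes $a$ and $b$ directly from the summations:

```python
def fit_line(x, y):
    """Least-squares intercept a and slope b for y = a + b*x,
    computed from the explicit summation formulas above."""
    n = len(x)
    sum_x = sum(x)
    sum_y = sum(y)
    sum_xx = sum(xi * xi for xi in x)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    a = (sum_y * sum_xx - sum_x * sum_xy) / (n * sum_xx - sum_x ** 2)
    b = (sum_xy - a * sum_x) / sum_xx  # second normal equation, rearranged
    return a, b

# Example: four points that lie close to the line y = 1 + 2x
a, b = fit_line([1, 2, 3, 4], [3.1, 4.9, 7.2, 8.8])
print(a, b)  # roughly 1.15 and 1.94
```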
Running a Straight-Line Regression
Even though it's possible to calculate a straight-line regression manually or with a calculator, it's not a good idea. You'll go crazy trying to evaluate all those summations and other calculations, and you'll almost certainly make a mistake somewhere along the way.
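Statistical software handles all of that for you. As one example (assuming you're working in Python with the SciPy package installed), scipy.stats.linregress runs the whole straight-line regression in a single call:

```python
from scipy import stats

x = [1, 2, 3, 4]
y = [3.1, 4.9, 7.2, 8.8]

# linregress returns the slope and intercept, plus the correlation
# coefficient, the p-value, and the standard error of the slope
result = stats.linregress(x, y)
print(result.intercept, result.slope)  # the same a and b the formulas give
```

Any other statistics package (R, SAS, SPSS, or even a spreadsheet) can do the same job; the point is to let the computer grind through the summations for you.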